Abstract

The spectral function $\rho_{V-A}(s)$ is determined from ALEPH and OPAL data on hadronic
tau decays using a neural network parametrization trained to retain the full experimental 
information on errors, their correlations and chiral sum rules: the DMO sum rule, the first 
and second Weinberg sum rules and the electromagnetic mass splitting of the pion sum rule. 
Nonperturbative QCD vacuum condensates can then be determined from finite energy sum 
rules. Our method minimizes all sources of theoretical uncertainty and bias, producing an
estimate of the condensates which is independent of the specific finite energy sum rule used.
The results for the central values of the condensates $\langle O_6 \rangle$ and $\langle O_8 \rangle$ are both negative.
1 Introduction



As the predictions of QCD become increasingly precise and more high quality data is avail- 
able, the theoretical uncertainties associated with the analysis of the data are often found to 
be dominant and thus have come under increasing scrutiny. In this work we continue pre- 
vious efforts [1,2] to optimize the analysis of the information contained in the experimental 
data, taking into account errors and correlations, while introducing the smallest possible theoretical
bias. We consider the determination of the QCD vacuum condensates $\langle O_6 \rangle$, $\langle O_8 \rangle$ and
higher dimensional condensates, which in principle can be extracted in a theoretically solid and 
experimentally clean way from the hadronic decays of the tau lepton. In practice, however, 
the situation is far from satisfactory, as revealed by the lack of stability of the value of the 
nonperturbative condensates obtained from this process by different procedures. 

The main source of difficulties can be traced to the fact that conventional extractions of
the QCD vacuum condensates from hadronic tau decays involve convolutions of the difference
$v_1(s) - a_1(s)$ of the isovector vector and axial vector spectral functions. This difference is a purely
nonperturbative quantity (the two spectral functions are degenerate within perturbation theory)
that has not yet converged to the perturbative result at energies $s < M_\tau^2$ and moreover has large
uncertainties in the high $s$ region. To obtain a reliable value for the nonperturbative condensates,
some convergence method must be applied, so that the error on the condensates becomes
entangled with the uncertainties of the theoretical assumptions and is therefore subject to a variety
of sources of theoretical bias. This is a consequence of the fact that the kinematics of
hadronic tau decays constrains the range of energies in which we can evaluate the spectral
functions. The main difficulty is that any method to estimate nonperturbative condensates
exploits the shape of the spectral functions near and beyond the boundary of the region where
data are available. Final results are then subject to systematic errors associated with the
extrapolation of the data as well as with the way global theoretical constraints, e.g. the Weinberg sum
rules, are imposed.

In this paper we approach the determination of nonperturbative condensates in a way
that tries to bypass these difficulties by combining two techniques. First, a novel bona fide
method to take into account experimental errors was proposed and implemented in the context
of the analysis of Deep Inelastic Scattering data, producing a probability measure in the space of
deep inelastic structure functions by means of neural networks [3]. Here, we adapt this method
to the parametrization of spectral functions. The second technique refines the training
of neural networks so as to implement the constraints that represent the QCD chiral sum rules
in our neural network parametrization of the spectral function $v_1(s) - a_1(s)$.

The representation of the probability density given in Ref. [3] takes the form of a set of 
neural networks, trained on an ensemble of Monte Carlo replicas of the experimental data, 
which reproduce their probability distribution. The parametrization is unbiased in the sense 
that neural networks do not rely on the choice of a specific functional form, and it interpolates
between data points, imposing smoothness constraints in a controllable way. Information on
experimental errors and correlations is incorporated in the Monte Carlo sample. Errors on
physical quantities and correlations between them can then be determined without the need for
linearized error propagation. Our final parametrization combines all the available experimental
information, as well as the constraints that different convolutions of the data, i.e. the chiral sum rules,
must satisfy. In this way statistical errors can be estimated, and the loss of accuracy due to the
extrapolation outside the kinematical region where the data is available can also be analyzed.
Hence, we can obtain a determination of the QCD vacuum condensates which is unbiased
with respect to the parametrization of the spectral function and to the propagation of errors and
correlations. We also try to keep under control all sources of uncertainty related to the method
of analysis, and estimate their contribution to the total error. This gives us a determination 
of the nonperturbative condensates, and simultaneously illustrates the power of a method of 
analysis based on direct knowledge of a probability density in a space of functions. 

This paper is organized as follows: in section 2 we review theoretical tools used in the anal- 
ysis of non-perturbative effects in hadronic tau decays; in section 3 we present the experimental 
data that is used in our analysis and in section 4 we introduce the neural network parametriza- 
tion of spectral functions. Section 5 contains the details and results of our extractions of the 
vacuum condensates: we explain our choice of training parameters, our error estimations and 
the consistency tests that we performed; finally section 6 summarizes our conclusions. 

2 Spectral functions and hadronic tau decays 

The tau particle is the only lepton massive enough to decay into hadrons. Already before its 
discovery, it was predicted to be important for the study of hadronic physics [4], a study that
has been performed extensively at the LEP accelerator. Its semileptonic decays are therefore 
an ideal tool for studying the hadronic weak currents under clean conditions, both theoretically 
and experimentally, thanks to the high quality data from LEP. In this first section we will briefly 
introduce the theoretical foundations that form the basis of the QCD analysis of hadronic tau 
decays. There exists a huge literature on tau hadronic physics to which the interested reader 
is directed (see for instance Ref. [5] and references therein). 

Spectral functions are the observables that give access to the inner structure of hadronic tau 
decays. As parity is maximally violated in $\tau$ decays, the spectral functions have both vector
and axial vector contributions. As spontaneous chiral symmetry breaking is a nonperturbative
phenomenon, these spectral functions are degenerate in perturbative QCD with massless light
quarks, so any difference between vector and axial vector spectral functions is necessarily generated
by nonperturbative dynamics, that is, long distance resonance phenomena, the
most relevant being the $\rho(770)$ and the $a_1(1260)$ in the vector and axial vector channels respectively.
Therefore, the difference of these spectral functions
provides a laboratory for the study of these nonperturbative contributions,
which turn out to be small and are therefore difficult to measure in other processes where the
perturbative contribution dominates. An accurate extraction of the QCD vacuum condensates
is not only important in itself but also has many important phenomenological applications, for
example in the evaluation of matrix elements in weak decays [6]. 

The ALEPH collaboration at LEP measured [7, 8] these spectral functions from hadronic
tau decays with great accuracy, providing an excellent source for precision analyses of nonperturbative
effects. As is well known [10], the basis of the comparison of theory with data is
the fact that unitarity and analyticity connect the spectral functions of hadronic tau decays to
the imaginary part of the hadronic vacuum polarization,

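$$\Pi^{\mu\nu}_{ij,U}(q) = i \int d^4x\, e^{iqx}\, \langle 0 | T \left( U^{\mu}_{ij}(x)\, U^{\nu}_{ij}(0)^{\dagger} \right) | 0 \rangle , \tag{1}$$

of vector $U^{\mu}_{ij} = V^{\mu}_{ij} = \bar q_j \gamma^{\mu} q_i$ or axial vector $U^{\mu}_{ij} = A^{\mu}_{ij} = \bar q_j \gamma^{\mu} \gamma_5 q_i$ color singlet quark currents in
corresponding quantum states. After Lorentz decomposition is used to separate the correlation
function into its $J = 1$ and $J = 0$ components,

$$\Pi^{\mu\nu}_{ij,U}(q) = \left( -g^{\mu\nu} q^2 + q^{\mu} q^{\nu} \right) \Pi^{(1)}_{ij,U}(q^2) + q^{\mu} q^{\nu}\, \Pi^{(0)}_{ij,U}(q^2) , \tag{2}$$

for non-strange quark currents one identifies

$$\mathrm{Im}\, \Pi^{(1)}_{\bar u d, V/A}(s) = \frac{1}{2\pi}\, v_1/a_1(s) . \tag{3}$$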

This relation allows us to implement all the technology of QCD vacuum correlation functions 
to hadronic tau decays, and provides the basis of the comparison of theory with data. 

The basic tool to study in a systematic way the power corrections introduced by nonperturbative
dynamics is the operator product expansion (OPE). Since the approach of Ref. [13], the
OPE has been used to perform QCD calculations in the
ambivalent energy regions where nonperturbative effects come into play but perturbative
QCD is still relevant. In general, the OPE of a two point correlation function $\Pi^{(J)}(s)$ takes the
form [10]

$$\Pi^{(J)}(s) = \sum_{D=0,2,4,\dots} \frac{1}{(-s)^{D/2}} \sum_{\dim O = D} C^{(J)}(s,\mu)\, \langle O(\mu) \rangle , \tag{4}$$

where the arbitrary parameter $\mu$ separates the long distance nonperturbative effects, absorbed
into the vacuum expectation values $\langle O(\mu) \rangle$, from the short distance effects, which are included
in the Wilson coefficients $C^{(J)}(s,\mu)$. The operator of dimension $D = 0$ is the unit operator
(perturbative series), and we are interested in the dimension $D \ge 6$ operators. What is relevant
for us is that $D = 6$ is the first non-vanishing nonperturbative contribution, in the limit of
massless light quarks, to the $v_1(s) - a_1(s)$ spectral function and, moreover, it has been shown
to be the dominant one. This dominant contribution carries non-trivial four-quark dynamical
effects of the form $\bar q_i \Gamma_1 q_j\, \bar q_k \Gamma_2 q_l$. Additional contributions from a mixed quark gluon condensate
as well as a triple gluon condensate are assumed to be small. Therefore, this spectral function 
should provide a source for a clean extraction of the value of the nonperturbative contributions. 

Finally, we review the important role that QCD chiral sum rules play in the analysis of
this process. Sum rules have always been an important tool for studies of non-perturbative 
aspects of QCD, and have been applied to a wide variety of processes, from Deep Inelastic 
Scattering to Heavy Quark systems [14], [15]. Now we will review one of the classical examples 
of low energy QCD sum rules, the chiral sum rules. The application of chiral symmetry together 
with the optical theorem leads to low energy sum rules involving the difference of vector and 
axial vector spectral functions, 

$$\rho_{V-A}(s) = v_1(s) - a_1(s) . \tag{5}$$

These sum rules are dispersion relations between real and absorptive parts of a two point
correlation function that transforms symmetrically under $SU(2)_L \otimes SU(2)_R$ in the case of non-strange
currents. The corresponding integrals are the Das-Mathur-Okubo (DMO) sum rule [18]



$$\frac{1}{4\pi^2} \int_0^{\infty} ds\, \frac{1}{s}\, \rho_{V-A}(s) = \frac{f_\pi^2 \langle r_\pi^2 \rangle}{3} - F_A , \tag{6}$$

as well as the first and second Weinberg sum rules (WSR) [19] 



$$\frac{1}{4\pi^2} \int_0^{\infty} ds\, \rho_{V-A}(s) = f_\pi^2 , \tag{7}$$

$$\frac{1}{4\pi^2} \int_0^{\infty} ds\, s\, \rho_{V-A}(s) = 0 , \tag{8}$$

where in eq. (7) the RHS term comes from the integration of the spin zero axial contribution,
which for massless non-strange quark currents consists exclusively of the pion pole. Finally,
there is the chiral sum rule associated with the electromagnetic splitting of the pion masses [20],

$$\frac{1}{4\pi^2} \int_0^{\infty} ds\, s \ln\frac{s}{\lambda^2}\, \rho_{V-A}(s) = -\frac{4\pi f_\pi^2}{3\alpha} \left( m_{\pi^\pm}^2 - m_{\pi^0}^2 \right) , \tag{9}$$

where $f_\pi = (92.4 \pm 0.3)$ MeV [12] is obtained from the decays $\pi^- \to \mu^- \bar\nu_\mu$ and $\pi^- \to \mu^- \bar\nu_\mu \gamma$,
$F_A = 0.0058 \pm 0.00008$ is the pion axial vector form factor$^1$ obtained from the radiative decays
$\pi^- \to \ell^- \bar\nu_\ell \gamma$, and $\langle r_\pi^2 \rangle = (0.439 \pm 0.008)$ fm$^2$ is the pion charge radius squared. From now on
these four chiral sum rules will be denoted by SR1, SR2, SR3 and SR4 respectively. It could
be argued that, since these chiral sum rules are taken in the chiral limit, the value of the
pion decay constant should be the chiral limit value, $f \simeq 0.94 f_\pi$ [16]. However, since the
experimental data consist of real pions, we consider it more reasonable to use the real
world value for the pion decay constant.

When switching quark masses on, only the first WSR remains valid while the second breaks 
down due to contributions from the difference of non-conserved vector and axial vector currents 
of order $m_q^2/s$ leading to a quadratic divergence of the integral. This is not numerically relevant
in our analysis because we deal with finite energy sum rules, and in this case the contribution 
from non-zero quark masses is negligible. 
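The left-hand sides of the four chiral sum rules, eqs. (6)-(9), can be evaluated numerically once the spectral function is known on a grid. The following is a minimal sketch, not the procedure used in the analysis: the function names, the toy spectral function (two Gaussian bumps, not data) and the trapezoidal quadrature are assumptions made for illustration, and the common normalization $1/4\pi^2$ is applied to all four integrals, which is harmless for eq. (8) since its right-hand side vanishes.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal rule, written out to avoid depending on a NumPy version."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def chiral_sum_rules(s, rho, s0, lam2=1.0):
    """Left-hand sides of eqs. (6)-(9), truncated at s0 (GeV^2).

    s, rho : sampled values of s > 0 and of rho_{V-A}(s) = v1(s) - a1(s).
    lam2   : arbitrary scale lambda^2 inside the logarithm of eq. (9).
    """
    s = np.asarray(s, dtype=float)
    rho = np.asarray(rho, dtype=float)
    mask = s <= s0
    s, rho = s[mask], rho[mask]
    norm = 1.0 / (4.0 * np.pi**2)
    return {
        "DMO": norm * trapezoid(rho / s, s),
        "WSR1": norm * trapezoid(rho, s),
        "WSR2": norm * trapezoid(s * rho, s),
        "EM": norm * trapezoid(s * np.log(s / lam2) * rho, s),
    }

# Toy input (NOT data): two Gaussian bumps imitating the rho and a1 resonances.
s_grid = np.linspace(0.01, 3.0, 3000)
toy_rho = (np.exp(-((s_grid - 0.6) / 0.15) ** 2)
           - 0.6 * np.exp(-((s_grid - 1.5) / 0.35) ** 2))
lhs = chiral_sum_rules(s_grid, toy_rho, s0=3.0)
```

Comparing each entry of `lhs` with the corresponding right-hand side as a function of `s0` is the kind of saturation test that the finite energy versions of the sum rules allow.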

The QCD vacuum condensates can be determined by virtue of the dispersion relation from 
another sum rule, that is, a convolution of the $\rho_{V-A}(s)$ spectral function with an appropriate
weight function. Let us define the operator product expansion of the chiral correlator in the 
following way 

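$$\Pi(Q^2)\big|_{V-A} = \sum_{n=1}^{\infty} \frac{1}{Q^{2n+4}}\, C_{2n+4}(Q^2,\mu^2)\, \langle O_{2n+4}(\mu^2) \rangle \equiv \sum_{n=1}^{\infty} \frac{1}{Q^{2n+4}}\, \langle O_{2n+4} \rangle . \tag{10}$$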

The Wilson coefficients, including radiative corrections, are absorbed into the nonperturbative 
vacuum expectation values, to facilitate comparison with the current literature. The analytic 
structure of $\Pi$ is subject to the dispersion relation

$$\Pi_{V-A}(Q^2) = \int_0^{\infty} ds\, \frac{1}{s + Q^2}\, \frac{1}{\pi}\, \mathrm{Im}\, \Pi_{V-A}(s) . \tag{11}$$

Condensates of arbitrary dimension are simply given by 

$$\langle O_{2n+2} \rangle = (-1)^n \int_0^{s_0} ds\, s^n\, \frac{1}{2\pi^2} \left( v_1(s) - a_1(s) \right) , \qquad n \ge 2 , \tag{12}$$

which, if the asymptotic regime has been reached, should be independent of the upper integration
limit for large enough $s_0$. As can be seen from the experimental data, errors in the large $s$
region are very important, so large errors are expected in the evaluation of the condensates.
The analysis of these sources of error is one of our main goals in the present analysis, and
is made possible by the natural capability of neural networks for smooth interpolation
while implementing all the experimental information on errors and correlations.
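The step from the dispersion relation, eq. (11), to the expression for the condensates, eq. (12), can be made explicit. The following is a formal sketch, assuming that $1/(s+Q^2)$ may be expanded in powers of $s/Q^2$ under the integral and using eq. (3) in the form $\frac{1}{\pi}\mathrm{Im}\,\Pi_{V-A}(s) = \frac{1}{2\pi^2}\rho_{V-A}(s)$:

```latex
\frac{1}{s+Q^2} = \sum_{k=0}^{\infty} \frac{(-1)^k s^k}{Q^{2k+2}}
\quad\Longrightarrow\quad
\Pi_{V-A}(Q^2) = \sum_{k=0}^{\infty} \frac{1}{Q^{2k+2}}
   \left[ (-1)^k \int_0^{\infty} ds\, s^k\, \frac{1}{2\pi^2}\, \rho_{V-A}(s) \right] .
```

Comparing the coefficient of $1/Q^{2k+2}$ with eq. (10) for $k = n \ge 2$ reproduces eq. (12), with dimension $D = 2n+2 = 6, 8, \dots$; the $k = 0$ and $k = 1$ moments are the ones constrained by the Weinberg sum rules, eqs. (7) and (8).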

1 Note that our definition of $F_A$ agrees with that of Ref. [7] but differs by a factor of 1/2 from that given in
Ref. [12].
2.1 Finite energy sum rules

Since all previous integrals have to be cut at some finite energy $s_0 < M_\tau^2$, because no experimental
information on $v_1(s) - a_1(s)$ is available above $M_\tau^2$, we must perform a truncation that
competes with all other sources of statistical and systematic errors, introducing a theoretical
bias which is difficult to estimate. Many techniques have been developed to deal with these finite
energy integrals, leading to the so-called Finite Energy Sum Rules (FESR). The paradigmatic
example is the calculation of spectral moments [21], that is, weighted integrals over spectral
functions. Choosing appropriate weights allows one to extract the maximum possible information
from the experimental data while minimizing the contribution from the region with larger
errors. These techniques allow a comparison of the same quantity evaluated on one side with
experimental data and on the other side with theoretical input, basically the Operator Product
Expansion with perturbative QCD corrections. The general expression that takes advantage of
the analyticity properties of the chiral correlators is given by

$$\int_0^{s_0} ds\, W(s)\, \mathrm{Im}\, \Pi^{(J)}_{V-A}(s) = -\frac{1}{2i} \oint_{|s|=s_0} ds\, W(s)\, \Pi^{(J)}_{V-A}(s) , \tag{13}$$

where $W(s)$ is an analytic function and $s_0$ is large enough for the OPE series to converge. The
LHS of eq. (13) can be evaluated using the experimental input from spectral functions as determined
in hadronic tau decays, while the RHS can be evaluated using the OPE representation
of the chiral correlator. Finally, a fit is performed to extract the OPE parameters from the
experimental data on spectral functions.
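The contour identity behind eq. (13) can be checked numerically on a toy correlator. The sketch below is an illustration under stated assumptions, not part of the analysis: it takes a single narrow resonance, $\Pi(s) = 1/(m^2 - s)$, for which $\mathrm{Im}\,\Pi(s) = \pi\,\delta(s - m^2)$ on the real axis, a pinched weight $W(s) = (1 - s/s_0)^2$, and a counter-clockwise orientation of the circle $|s| = s_0$. The names `contour_side`, `toy_pi` and `pinched` are hypothetical.

```python
import numpy as np

def contour_side(weight, correlator, s0, n=4096):
    """-(1/2i) times the counter-clockwise integral over |s| = s0 of W(s) * Pi(s)."""
    theta = 2.0 * np.pi * (np.arange(n) + 0.5) / n   # midpoint rule in the angle
    s = s0 * np.exp(1j * theta)
    ds = 1j * s * (2.0 * np.pi / n)                  # ds = i s dtheta
    return -np.sum(weight(s) * correlator(s) * ds) / 2j

m2, s0 = 0.6, 3.0                                    # toy values in GeV^2

def toy_pi(s):
    """Single narrow resonance; Im Pi = pi * delta(s - m2) on the real axis."""
    return 1.0 / (m2 - s)

def pinched(s):
    """Polynomial weight that vanishes at the upper integration limit s = s0."""
    return (1.0 - s / s0) ** 2

lhs = np.pi * pinched(m2)                 # integral of W * Im Pi over the delta function
rhs = contour_side(pinched, toy_pi, s0)   # contour side of the identity
```

For this toy correlator `lhs` and `rhs` agree up to rounding, which fixes the sign and normalization conventions used on the two sides of the sum rule.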

A common hypothesis in the majority of this kind of analysis is that the difference of the 
OPE representation for the chiral correlator from the full expression, 

$$R[s_0, W] = -\frac{1}{2\pi i} \oint_{|s|=s_0} ds\, \left( \Pi_{V-A,\mathrm{OPE}}(s) - \Pi_{V-A}(s) \right) W(s) , \tag{14}$$

can be neglected. This quantity is a measure of the OPE breakdown, also known as duality
violation$^2$. It is necessary to take into account that $\Pi_{V-A,\mathrm{OPE}}$ fails at least in some region of
the integration contour. This was shown in Ref. [17], where it was demonstrated that the OPE
representation breaks down near the timelike real $s$ axis for insufficiently large $s_0$. The neglect
of the duality violation component of the OPE is a key dynamical assumption, and there exist
several strategies to minimize its impact, such as working with duality points [28] or using pinched
Finite Energy Sum Rules [29], with polynomial weights that vanish at the upper integration
limit. All these techniques yield different although compatible values for $\langle O_6 \rangle$, whereas
incompatible results are obtained for higher condensates.

Other types of finite energy sum rules have been used to extract the values of the condensates 
and other phenomenologically relevant related quantities. Borel sum rules and Gaussian sum 
rules [24] take advantage of certain combinations of the condensates that theoretically optimize
the accuracy of the extraction. Inverse moment sum rule [23] techniques make a connection
between the phenomenological parameters of the QCD effective Lagrangian and the nonperturbative
condensates. In section 5 we will compare our extraction of the condensates to those
obtained with all these methods and argue why ours has a reasonable control of the different 
theoretical uncertainties. 

2 For a review of the current theoretical status of the quark-hadron duality violations, see Ref. [25] 
Our approach differs from previous determinations of nonperturbative condensates.
First of all, we use the smooth interpolation of the neural network to extend the range
of integration, so that our determination of the condensates corresponds to the asymptotic energy
region $s_0 \to \infty$. The second point is that the training method allows the incorporation
of the chiral sum rules into the neural network parametrization of the spectral function, and
therefore the specific weight used to determine the condensate turns out not to be relevant:
different choices of weights differ by chiral sum rules that are already satisfied by our parametrization.
Therefore, the final results for the nonperturbative condensates that emerge from the
neural network parametrization of the spectral function $v_1(s) - a_1(s)$ are determined by

$$\langle O_{2n+2} \rangle = (-1)^n \int_0^{\infty} ds\, s^n\, \frac{1}{2\pi^2} \left( v_1(s) - a_1(s) \right) , \qquad n \ge 2 . \tag{15}$$

In the next sections we argue why this choice is the most reasonable one, showing that all
relevant constraints are verified.
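As an illustration of how a moment of the form of eq. (15) is evaluated once a parametrization of the spectral function is available on a grid, consider the following minimal sketch. It assumes the normalization $1/2\pi^2$ and the sign $(-1)^n$ written above, replaces the infinite upper limit by the end of the grid, and uses a toy spectral function (not data) that vanishes at large $s$; the function name `condensate` is hypothetical.

```python
import numpy as np

def condensate(s, rho, n):
    """<O_{2n+2}> = (-1)^n * int ds s^n rho(s) / (2 pi^2), by the trapezoidal rule."""
    if n < 2:
        raise ValueError("the moment is defined for n >= 2")
    s = np.asarray(s, dtype=float)
    f = s**n * np.asarray(rho, dtype=float) / (2.0 * np.pi**2)
    return (-1) ** n * float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(s)))

# Toy spectral function (NOT data), negligible well before the end of the grid.
s_grid = np.linspace(0.0, 6.0, 6001)
toy_rho = (np.exp(-((s_grid - 0.6) / 0.15) ** 2)
           - 0.6 * np.exp(-((s_grid - 1.5) / 0.35) ** 2))

o6 = condensate(s_grid, toy_rho, 2)   # dimension-6 moment of the toy function
o8 = condensate(s_grid, toy_rho, 3)   # dimension-8 moment of the toy function
```

Because the weight $s^n$ grows with $n$, the higher moments are increasingly sensitive to the large $s$ tail of the spectral function, which is where the experimental errors are largest.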



3 Experimental data 

Since the relevant spectral function for the determination of the condensates is the $v_1(s) - a_1(s)$
spectral function, we need a simultaneous measurement of the vector and axial-vector spectral 
functions. Data from the ALEPH Collaboration [7], [8] and from the OPAL collaboration [9] 
will be used, which provide a simultaneous determination of the vector and axial vector spectral 
functions in the same kinematic region and also provide the full set of correlated uncertainties 
for these measurements. Although the ALEPH data is of a higher quality due to the smaller 
errors, see fig. (1), the input from OPAL is complementary and will provide a cross check for 
our extractions of the nonperturbative condensates. There exists additional data on spectral 
functions coming from electron-positron annihilation, but their quality is lower than the data 
from hadronic tau decays and will be ignored here. 

ALEPH experimental data consist of the invariant mass-squared spectra for both the
vector+axial vector and vector-axial vector components, which are related to the spectral functions
by a kinematic factor and a branching ratio normalization



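$$v_1(s)/a_1(s) = \frac{M_\tau^2}{6\, |V_{ud}|^2\, S_{EW}}\, \frac{B(\tau^- \to V^-/A^-\, \nu_\tau)}{B(\tau^- \to e^- \bar\nu_e \nu_\tau)}\, \frac{dN_{V/A}}{N_{V/A}\, ds} \left[ \left( 1 - \frac{s}{M_\tau^2} \right)^2 \left( 1 + \frac{2s}{M_\tau^2} \right) \right]^{-1} . \tag{16}$$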



Altogether our parametrization is based on $N_{dat} = 61$ experimental points for ALEPH,
although the full experimental data set consists of 70 points uniformly distributed between 0 and
3.5 GeV$^2$, because only points with $s < M_\tau^2 = 3.16$ GeV$^2$ are physically meaningful, and
before this kinematic threshold is reached the invariant mass-squared spectrum vanishes due
to phase-space suppression. For OPAL the data sample is a bit larger, $N_{dat} = 97$, with the
same restrictions as in the ALEPH case. Henceforth, $\rho^{(\exp)}_{V-A,i}$ will denote the $i$-th data point
$\rho_{V-A}(s_i) = v_1(s_i) - a_1(s_i)$. Figure (1) shows the experimental data used together with diagonal
errors.

Note that errors are small in the low and middle s regions and that they become larger 
as we approach the tau mass threshold. The last points are almost zero in the invariant mass 
spectrum, and are only enhanced in the spectral functions due to the large kinematic factor for
$s$ near $M_\tau^2$, so special care must be taken with the physical relevance of these points.
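The size of this enhancement is easy to quantify. The sketch below assumes the standard phase-space factor $(1 - s/M_\tau^2)^2 (1 + 2s/M_\tau^2)$, whose inverse multiplies the invariant mass-squared spectrum, and the value $M_\tau^2 = 3.16$ GeV$^2$ quoted in the text; the function name is hypothetical.

```python
M_TAU2 = 3.16  # GeV^2, value quoted in the text

def kinematic_enhancement(s, m_tau2=M_TAU2):
    """Inverse phase-space factor [(1 - s/M^2)^2 (1 + 2 s/M^2)]^(-1)."""
    x = s / m_tau2
    if not 0.0 <= x < 1.0:
        raise ValueError("s must satisfy 0 <= s < M_tau^2")
    return 1.0 / ((1.0 - x) ** 2 * (1.0 + 2.0 * x))

# The factor grows monotonically with s and diverges at the kinematic limit.
for s in (0.5, 1.5, 2.5, 3.0):
    print(f"s = {s:.1f} GeV^2 -> factor {kinematic_enhancement(s):.1f}")
```

At $s = 3.0$ GeV$^2$ the factor already exceeds 100, so statistical fluctuations of a nearly empty bin of the mass spectrum are amplified by two orders of magnitude in the spectral function.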
Figure 1:

Experimental data for the $v_1(s) - a_1(s)$ spectral function from the ALEPH (left) and OPAL
(right) collaborations. Note that the errors are smaller in the ALEPH data but the OPAL central
values are nearer to the expected zero value at large $s$.



It is clear that the vanishing of the spectral function is not reached for $s < M_\tau^2$ and must
be enforced artificially on the parametrization we are constructing; that is, we must devise a
technique to impose the asymptotic constraint that at high $s$ this spectral function vanishes.
The method we use takes advantage of the smooth, unbiased interpolation capability of the
neural network: artificial points are added to the data set with adjusted errors in a region
where $s$ is high enough that the $\rho_{V-A}(s)$ spectral function should vanish. Once these artificial
points are included, in a way to be discussed later, the neural network will smoothly interpolate 
between the real and artificial data points, also taking into account the constraints of the sum 
rules, as explained below. 

The experimental data points are highly correlated, since most entries of the covariance
matrix are nonzero, so it is crucial to take into account all their
correlations, which are especially relevant in the high $s$ region. This is important because this
region dominates the sum rule, eq. (12), that determines the vacuum condensates. As we
shall discuss shortly, correlated errors are incorporated as a measure on the space of neural
network parametrizations of the spectral functions using Monte Carlo statistical replicas of the 
experimental data. 

4 Neural network parametrization 

Ideally, a parametrization of spectral functions must incorporate all the information contained in 
the experimental measurements, i.e. their central values, their statistical and systematic errors 
and their correlations; furthermore, it must interpolate between them without introducing any
bias. We will follow the method of Ref. [3], where an unbiased extraction of the probability 
measure in the space of structure functions of deep-inelastic scattering is performed, based on 
a coordinated use of Monte Carlo generation of data and neural network fits. 
4.1 Probability measure in the space of spectral functions

The experimental data gives us a probability measure in an $N_{dat}$ dimensional space, assumed
to be multigaussian. In order to extract from it a parametrization of the desired spectral
function we must turn this measure into a measure $\mathcal{P}[\rho_{V-A}]$ in a space of functions. Once such
a measure is constructed, the expectation value of any observable $\mathcal{F}[\rho_{V-A}(s)]$ can be found by
computing the weighted average

$$\langle \mathcal{F}[\rho_{V-A}(s)] \rangle = \int \mathcal{D}\rho_{V-A}\, \mathcal{F}[\rho_{V-A}(s)]\, \mathcal{P}[\rho_{V-A}] . \tag{17}$$

Errors and correlations can also be obtained from this measure, by considering higher moments 
of the same observable with respect to the probability distribution. 
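In practice, when the measure is represented by a finite ensemble of parametrizations, the functional integral reduces to an average over the members of the ensemble. The following sketch assumes equal weights for all members and uses random toy numbers in place of real observables; the function name `ensemble_moments` is hypothetical.

```python
import numpy as np

def ensemble_moments(values):
    """Mean, standard deviation and correlation matrix over an ensemble.

    values : array of shape (n_rep, n_obs); row k holds the observables
             F[rho^(k)] evaluated on the k-th member of the ensemble.
    """
    values = np.asarray(values, dtype=float)
    mean = values.mean(axis=0)                 # first moment: central value
    std = values.std(axis=0, ddof=1)           # second moment: error
    corr = np.corrcoef(values, rowvar=False)   # correlations between observables
    return mean, std, corr

# Toy ensemble (NOT results): two observables evaluated on 1000 members.
rng = np.random.default_rng(0)
toy = rng.normal(loc=[1.0, 2.0], scale=[0.5, 0.2], size=(1000, 2))
mean, std, corr = ensemble_moments(toy)
```

No linearized error propagation is involved: any observable, however nonlinear in the spectral function, is simply evaluated member by member and then averaged.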

The determination of an infinite-dimensional measure from a finite set of data points is an 
ill-posed problem, unless one introduces further assumptions. In the approach of Ref. [3], neural
networks are used as interpolating functions, so that the only assumption is the smoothness 
of the spectral function. Neural networks can fit any continuous function through a suitable 
training; smoother functions require a shorter training and less complex networks. Hence, 
an ideal degree of smoothness can be established on the basis of a purely statistical criterion 
without the need for further assumptions. 



4.2 Fitting strategy 

The construction of the probability measure is done in two steps: first, a set of Monte Carlo 
replicas of the original data is generated. This gives a representation of the probability density 
$\mathcal{P}[\rho]$ at the points $s_i$ where data is available. Then a neural network is fitted to each replica.
The ensemble of neural networks gives a representation of the probability density for all s: 
when interpolating between data the uncertainty will be kept under control by the smoothness 
constraint, but it will become increasingly more sizable when extrapolating away from the data 
region. 

The $k = 1, \dots, N_{rep}$ replicas of the data are generated as

$$\rho^{(art)(k)}_{V-A,i} = \rho^{(\exp)}_{V-A,i} + r_i^{(k)} \sigma_i , \tag{18}$$

where $\rho^{(\exp)}_{V-A,i} = \rho_{V-A}(s_i)$ are the original data, $\sigma_i$ is the diagonal error, and $r_i^{(k)}$ are univariate
gaussian random numbers whose correlation matrix equals that of the experimental data. The
fact that the correlation matrix of the $r_i^{(k)}$ equals that of the experimental data is crucial to
retain all the experimental information in our treatment. A set of $N_{rep} = 1000$ replicas
of this form is then generated, and it is verified that the central values, errors and correlations of the
original experimental data are well reproduced by taking the relevant averages over a sample
of this size. As explained above, the asymptotic constraint that $\rho_{V-A}(s \to \infty) = 0$ has been
implemented by adding a number of artificial data points with adjusted errors.
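One standard way to produce gaussian random numbers with a prescribed correlation matrix is to multiply independent unit gaussians by a Cholesky factor of that matrix. The sketch below illustrates the replica generation under this assumption; the three data points, their errors and their correlation matrix are invented toy values, and the function name is hypothetical.

```python
import numpy as np

def generate_replicas(central, sigma, corr, n_rep, seed=0):
    """rho_i^(art)(k) = rho_i^(exp) + r_i^(k) * sigma_i, with corr(r_i, r_j) = corr_ij."""
    central = np.asarray(central, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    rng = np.random.default_rng(seed)
    chol = np.linalg.cholesky(np.asarray(corr, dtype=float))   # corr = L L^T
    # Rows of independent unit gaussians are mapped to correlated ones: r = L z.
    r = rng.standard_normal((n_rep, central.size)) @ chol.T
    return central + r * sigma

# Toy inputs (NOT data): three points with their errors and correlations.
central = np.array([0.05, 0.02, -0.01])
sigma = np.array([0.010, 0.015, 0.030])
corr = np.array([[1.0, 0.6, 0.3],
                 [0.6, 1.0, 0.5],
                 [0.3, 0.5, 1.0]])
replicas = generate_replicas(central, sigma, corr, n_rep=1000)
```

Averages, standard deviations and correlations computed over the rows of `replicas` should then approach `central`, `sigma` and `corr` as the number of replicas grows.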

To verify that the central values, errors and correlations of the original experimental data 
are well reproduced, we can define statistical estimators that measure the deviations from the 
original correlations. A suitable one is the scatter correlation, which measures the deviation
of the averages over the replica set from the original experimental values, and is defined as
$\rho_{V-A}(s)$

| $N_{rep}$                   | 10     | 100    | 1000    |
| $r[\rho^{(art)}_{V-A}(s)]$  | 0.9803 | 0.9997 | 0.9998  |
| $r[\sigma^{(art)}]$         | 0.9894 | 0.9992 | 0.99994 |
| $r$ [correlations]          | 0.61   | 0.955  | 0.9956  |

Table 1: Comparison between experimental and Monte Carlo generated artificial data for the
$\rho_{V-A}(s)$ spectral function. Note that the scatter correlation $r$, defined in eq. (19), for $N_{rep} = 100$
is already very close to 1 for all statistical estimators.

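follows for the central value

$$r\left[\rho^{(art)}_{V-A}\right] = \frac{\left\langle \left\langle \rho^{(art)}_{V-A} \right\rangle_{rep} \rho^{(\exp)}_{V-A} \right\rangle_{dat} - \left\langle \left\langle \rho^{(art)}_{V-A} \right\rangle_{rep} \right\rangle_{dat} \left\langle \rho^{(\exp)}_{V-A} \right\rangle_{dat}}{\sigma_s^{(art)}\, \sigma_s^{(\exp)}} , \tag{19}$$

where the scatter variances are defined by

$$\sigma_s^{(\exp)} = \sqrt{ \left\langle \left( \rho^{(\exp)}_{V-A} \right)^2 \right\rangle_{dat} - \left( \left\langle \rho^{(\exp)}_{V-A} \right\rangle_{dat} \right)^2 } , \tag{20}$$

$$\sigma_s^{(art)} = \sqrt{ \left\langle \left( \left\langle \rho^{(art)}_{V-A} \right\rangle_{rep} \right)^2 \right\rangle_{dat} - \left( \left\langle \left\langle \rho^{(art)}_{V-A} \right\rangle_{rep} \right\rangle_{dat} \right)^2 } , \tag{21}$$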



and similarly for the diagonal errors and the correlations. In table 1 we show the scatter
correlations for the central values, the errors and the correlations. We observe that we need 
N rep = 100 replicas to maintain the correlations of the original data, which is the main purpose 
of our analysis. We have checked that increasing the number of training replicas does not 
decrease the errors in the extraction of the condensate further, meaning that we have reached 
a faithful representation of errors. 
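The scatter correlation can be computed in a few lines. The sketch below assumes that it is the usual correlation coefficient, taken over data points, between the replica averages and the experimental values, with $\langle \cdot \rangle_{dat}$ the mean over data points; the function name and the toy numbers are hypothetical.

```python
import numpy as np

def scatter_correlation(replica_avg, experimental):
    """Scatter correlation r between replica averages and experimental values.

    Both arguments are arrays over data points; <.>_dat is the mean over them.
    """
    a = np.asarray(replica_avg, dtype=float)
    e = np.asarray(experimental, dtype=float)
    cov = np.mean(a * e) - np.mean(a) * np.mean(e)
    sig_a = np.sqrt(np.mean(a**2) - np.mean(a) ** 2)   # scatter variance, replicas
    sig_e = np.sqrt(np.mean(e**2) - np.mean(e) ** 2)   # scatter variance, experiment
    return cov / (sig_a * sig_e)

# Toy check (NOT data): replica averages equal to the data plus small noise.
rng = np.random.default_rng(1)
exp_values = np.array([0.05, 0.02, -0.01, 0.03, -0.02])
avg_values = exp_values + 0.001 * rng.standard_normal(exp_values.size)
r_central = scatter_correlation(avg_values, exp_values)
```

The same function applies to the diagonal errors and to the entries of the correlation matrix, with the corresponding replica estimates in place of the central values.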

Each set of generated data is fitted by an individual neural network. A neural network [32], [31]
is a function of a number of parameters, which fix the strength of the coupling between
neurons and the threshold of activation of each neuron. The architecture of the network has 
been chosen to be 1-4-4-1, small enough to avoid overlearning and large enough to capture the 
non-linear structure of experimental data. The networks that we use are multilayer feed-forward 
neural networks constructed according to the following recursive relation 

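$$\xi_i^{(l)} = g\left( h_i^{(l)} \right) , \tag{22}$$

$$h_i^{(l)} = \sum_{j=1}^{n_{l-1}} \omega_{ij}^{(l)}\, \xi_j^{(l-1)} - \theta_i^{(l)} , \tag{23}$$

where $\omega_{ij}^{(l)}$ is the weight, the strength of the connection between two neurons, $\theta_i^{(l)}$ are the
thresholds of each neuron, $\xi_i^{(l)}$ is the activation state of each neuron, $n_{l-1}$ is the number of
neurons in layer $l-1$, and $g$ is the activation function of the neurons.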
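A forward pass through a network with the 1-4-4-1 architecture can be sketched as follows. The choice of a sigmoid for $g$, the linear output layer (so that the output can take either sign) and the random initialization are assumptions of this sketch, not statements about the networks used in the analysis; all names are hypothetical.

```python
import numpy as np

def sigmoid(h):
    """Assumed activation function g."""
    return 1.0 / (1.0 + np.exp(-h))

def init_network(sizes=(1, 4, 4, 1), seed=0):
    """Random weights w^(l) of shape (n_l, n_{l-1}) and thresholds theta^(l)."""
    rng = np.random.default_rng(seed)
    return [(rng.normal(size=(n_out, n_in)), rng.normal(size=n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, s):
    """xi^(l) = g(h^(l)) with h^(l) = w^(l) xi^(l-1) - theta^(l); linear last layer."""
    xi = np.atleast_1d(np.asarray(s, dtype=float))
    for layer, (w, theta) in enumerate(params):
        h = w @ xi - theta
        xi = h if layer == len(params) - 1 else sigmoid(h)
    return float(xi[0])

net = init_network()
n_params = sum(w.size + theta.size for w, theta in net)   # weights plus thresholds
output = forward(net, 1.0)                                # network evaluated at s = 1
```

With this architecture the network has 33 free parameters (24 weights and 9 thresholds), far fewer than the number of data points.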






We divide the training of the neural network into two epochs. In the first epoch, the training
is done by backpropagation, where the parameters of the network are fitted by
minimizing the error function

$$E_{err} = \sum_{i=1}^{N_{dat}} \frac{\left( \rho_i^{(net)(k)} - \rho_i^{(art)(k)} \right)^2}{\sigma_i^{(\exp)\,2}} , \tag{24}$$

where $\rho_i^{(net)(k)}$ is the prediction for the $i$-th data point from the net trained on the $k$-th replica
of the data. A more detailed review of neural network learning techniques is presented in
appendix A.

In a second training epoch, a different training technique, genetic algorithm training, is
used to implement the constraints from the sum rules. As explained below and in appendix
A, this technique allows us to implement non-local constraints in our training, such as convolutions
of the neural network output, with adjusted weights so that the chiral sum rules control the
neural network interpolation in the data region where errors are larger. The error function
eq. (24) is modified by adding a contribution proportional to the difference between the chiral sum
rules, evaluated with the output of the trained networks, and their theoretical values, that is

$$E_{tot} = E_{err} + E_{sr} = E_{err} + \sum_{i=1}^{4} w_{sr,i} \left( \int ds\, f_i\left( \rho_{V-A}(s) \right) - A_i \right)^2 , \tag{25}$$

where $w_{sr,i}$ is the relative weight of each sum rule and $A_i$ is the theoretical value of the corresponding
sum rule, eqs. (6)-(9). We note that this definition introduces a new set of, in
principle, arbitrary parameters, namely the relative weights of the sum rules. As explained
below, these are determined by stability criteria, demanding that the contribution from $E_{sr}$ is
similar to that of $E_{err}$ and that the final result is not sensitive to the specific values of these
parameters.
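The structure of the modified error function can be sketched as follows. The quadratic form of the sum rule penalty is an assumption of this sketch, as are the function name and the toy numbers; the sum rule integrals themselves are taken as precomputed inputs.

```python
import numpy as np

def total_error(net_at_data, replica, sigma, sr_values, sr_targets, sr_weights):
    """E_tot = E_err + E_sr for one replica.

    E_err : chi^2-like sum over data points, as in eq. (24).
    E_sr  : weighted squared deviations of the sum rule integrals from their
            theoretical values (the quadratic form is an assumption here).
    """
    net_at_data = np.asarray(net_at_data, dtype=float)
    replica = np.asarray(replica, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    e_err = np.sum(((net_at_data - replica) / sigma) ** 2)
    dev = np.asarray(sr_values, dtype=float) - np.asarray(sr_targets, dtype=float)
    e_sr = np.sum(np.asarray(sr_weights, dtype=float) * dev**2)
    return float(e_err + e_sr), float(e_err), float(e_sr)

# Toy numbers (NOT data): two data points and a single sum rule.
e_tot, e_err, e_sr = total_error(
    net_at_data=[1.0, 2.0], replica=[1.0, 1.0], sigma=[1.0, 0.5],
    sr_values=[0.1], sr_targets=[0.0], sr_weights=[10.0],
)
```

Returning the two contributions separately makes it straightforward to monitor the stability criterion stated above, namely that $E_{sr}$ and $E_{err}$ are of similar size.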

The basic idea of genetic algorithm training, also known as natural selection training,
is as follows. The training is divided into generations. For each generation the parameters of
the network (weights and thresholds) are arranged to form a chain, called the DNA chain. This
chain is replicated many times, creating a population of identical individuals. Random
mutations are then applied to each individual, where by mutation we mean a small change in one of
the bits of its DNA chain. Next, the error function associated with each mutated individual
is computed, which requires converting the DNA bits back into weights and
thresholds and calculating the output of the new network. Only the best individuals are
kept and the rest are discarded, mimicking natural selection. This method provides a suitable
technique to implement the effect of the chiral sum rules in our neural network training. Note
that this technique leads to a significant increase in computing time, because
the chiral sum rules must be evaluated numerically many times in each generation.
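
The replicate, mutate and select cycle described above can be sketched as follows. This is a minimal illustration and not the implementation used in the analysis: it mutates real-valued parameters directly instead of bits of an encoded chain, keeps a single best individual per generation, and the function name `genetic_minimize` and all its arguments are hypothetical.

```python
import numpy as np

def genetic_minimize(error, params, n_generations=200, population=20,
                     mutation_size=0.1, seed=0):
    """Minimize error(params) by replication, random mutation and selection."""
    rng = np.random.default_rng(seed)
    best = np.asarray(params, dtype=float).copy()
    best_error = float(error(best))
    for _ in range(n_generations):
        # Replicate the current chain into a population of identical copies.
        clones = np.tile(best, (population, 1))
        # Mutate one randomly chosen parameter of each individual.
        idx = rng.integers(0, best.size, size=population)
        clones[np.arange(population), idx] += rng.normal(0.0, mutation_size,
                                                         size=population)
        # Selection: keep the best mutant only if it improves on the parent.
        errors = np.array([error(c) for c in clones])
        k = int(np.argmin(errors))
        if errors[k] < best_error:
            best, best_error = clones[k].copy(), float(errors[k])
    return best, best_error
```

Since only the value of the error function is needed, `error` may contain integrals over the network output, at the price of one evaluation per individual and per generation.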

The main advantage of genetic algorithms is that they allow neural networks to learn from
error functions that are too complicated for backpropagation
training. Furthermore, genetic algorithms can be shown to search the parameter
space of solutions efficiently, exploring exponentially many more reasonable outputs than
manifestly wrong ones. Genetic algorithms can also handle the training of very large neural
networks.

The parametrization obtained by means of the genetic algorithm training is represented in
figure (2), where the output of the trained neural network, without and with the inclusion of the
chiral sum rules, is compared with the experimental data points together with the corresponding
statistical errors. It is clear that the effect of the chiral sum rules on the trained neural network
output is to force it to reach the asymptotic behavior of the spectral function $\rho_{V-A}(s)$ faster.

Figure 2: Output of the neural network trained over the experimental ALEPH data, without chiral sum rules
(left) and with chiral sum rules (right) incorporated in the training. Note that the effect of
the chiral sum rules is that the network output reaches the asymptotic behavior of the
spectral function faster.

A common problem in genetic algorithm learning techniques is getting stuck in a local
minimum of the error function, far from the absolute minimum. In our training this difficulty
has been bypassed by means of simple modifications of the basic training procedure.
First, within each generation large additional mutations are performed that allow the network
configuration to escape from local minima. Secondly, as the training advances, the rate of
the mutations decreases, allowing for better local learning. These modifications are
instrumental in reducing the long duration of the training.
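
The two modifications can be illustrated with a small sketch. The geometric decay law, the probability of a large mutation and the names `mutation_scale` and `mutate` are assumptions made for this example; the text does not specify the actual schedule.

```python
import numpy as np

def mutation_scale(generation, n_generations, initial=0.5, final=0.01):
    """Typical mutation size, decreasing geometrically as training advances."""
    frac = generation / max(n_generations - 1, 1)
    return initial * (final / initial) ** frac

def mutate(params, generation, n_generations, rng,
           large_prob=0.05, large_factor=10.0):
    """Mutate one parameter; occasionally apply a much larger mutation."""
    scale = mutation_scale(generation, n_generations)
    if rng.random() < large_prob:
        # Rare large mutation, meant to help escape from local minima.
        scale *= large_factor
    child = np.array(params, dtype=float)
    i = rng.integers(child.size)
    child[i] += rng.normal(0.0, scale)
    return child
```

Early generations explore widely, while late generations refine the configuration locally.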

4.3 Results and validation 

Once all the parameters of the training have been determined by stability criteria, an independent
set of neural networks is trained on the spectral function $v_1(s) - a_1(s)$. The length of the
training is fixed by studying the behavior of the error function, as defined in eq. (24), for
the neural net fitted to the central experimental values, and requiring that the error function
per data point (that is, divided by $N_{\rm dat}$) stabilizes at
a value close to one, which can be considered a good training.

A number of checks is then performed in order to make sure that an unbiased representation 
of the probability density has been obtained. First, we have verified that the covariance of 
two data points computed from the Monte Carlo sample of nets is on average very close to 
the corresponding covariance matrix element of the data. Since correlations of the data are 
entirely due to systematic errors, this indicates that these errors are correctly reproduced. 
Statistical estimators are then constructed as in eq. (19), but now referred to the neural
networks trained over the replicas, to explicitly verify that the training maintains all the experimental
information. The scatter correlation is now defined as

$$r\left[\rho_{V-A}^{(\rm net)}\right] = \frac{\left\langle \rho^{(\rm exp)} \left\langle \rho^{(\rm net)} \right\rangle_{\rm rep} \right\rangle_{\rm dat} - \left\langle \rho^{(\rm exp)} \right\rangle_{\rm dat} \left\langle \left\langle \rho^{(\rm net)} \right\rangle_{\rm rep} \right\rangle_{\rm dat}}{\sigma_s^{(\rm net)}\, \sigma_s^{(\rm exp)}} , \qquad (26)$$

and the corresponding values for the training of $N_{\rm rep} = 100$ replicas are presented in Table 2.
It is seen that the central values and the correlations are well reproduced, whereas this is not
the case for the diagonal errors.

Table 2: Comparison between experimental and generated artificial data for the $\rho_{V-A}(s)$ spectral function.

    $N_{\rm rep}$                              10        100
    $r[\rho_{V-A}(s)^{(\rm net)}]$             0.980     0.981
    [row label illegible]                     -0.21     -0.20
    [row label illegible]                      0.80      0.85
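
The two checks described in this subsection, the covariance computed from the Monte Carlo sample of nets and the scatter correlation between data and network output, can be sketched as follows. The sketch assumes that the scatter correlation is an ordinary correlation coefficient over data points between the experimental values and the replica-averaged network output; the function names and array layouts are hypothetical.

```python
import numpy as np

def scatter_correlation(exp_values, net_replicas):
    """Correlation over data points between data and replica-averaged output.

    exp_values:   shape (n_dat,), experimental central values
    net_replicas: shape (n_rep, n_dat), network output of each replica
    """
    net_mean = net_replicas.mean(axis=0)  # average over replicas
    cov = np.mean(exp_values * net_mean) - exp_values.mean() * net_mean.mean()
    return cov / (exp_values.std() * net_mean.std())

def replica_covariance(net_replicas):
    """Covariance between data points estimated from the replica sample."""
    return np.cov(net_replicas, rowvar=False)
```

A value of `scatter_correlation` close to one indicates that the central values are well reproduced, and `replica_covariance` can be compared element by element with the experimental covariance matrix.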

The average standard deviation for each data point computed from the Monte Carlo sample
of nets is substantially smaller than the experimental error. This is due to the fact that the
network is combining the information from different data points by capturing an underlying
law, or that it is introducing a smoothing bias. This effect is enhanced by the inclusion of the sum
rule constraints. All networks have to fulfill these constraints, which forces the fit to behave
smoothly in a region where the experimental errors are very large. This should be understood as a
success of the fitting procedure.

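The final set of neural networks $\rho^{({\rm net})(k)}_{V-A}(s)$ provides a representation of the probability measure
in the space of spectral functions, which can be used to estimate any functional average, defined
as in eq. (fTTj), using

$$\left\langle \mathcal{F}\left[\rho_{V-A}(s)\right] \right\rangle = \frac{1}{N_{\rm rep}} \sum_{k=1}^{N_{\rm rep}} \mathcal{F}\left[\rho^{({\rm net})(k)}_{V-A}(s)\right] . \qquad (27)$$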

In particular, the average and standard deviation of the nonperturbative condensates computed 
using the Monte Carlo sample will provide a determination of the central values and errors of 
these condensates. 
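
The Monte Carlo average over replicas amounts to evaluating the functional on each trained network and taking the sample mean and standard deviation. A minimal sketch, in which each replica is represented by a callable and the name `functional_average` is hypothetical:

```python
import numpy as np

def functional_average(replicas, functional):
    """Mean and sample standard deviation of functional(rho) over replicas.

    replicas:   iterable of callables rho(s), one per trained network
    functional: callable mapping a function rho to a number
    """
    values = np.array([functional(rho) for rho in replicas])
    return values.mean(), values.std(ddof=1)
```

The mean plays the role of the central value and the standard deviation that of the propagated experimental error.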



4.4 Details of the genetic algorithm training 

As explained above, in the second part of the training the chiral sum rules, eqs. (jBKI), are
incorporated into the error functional, eq. (24). These sum rules act as constraints on the
neural network output, that is, the main contribution to the error function (which determines
the learning of the network) still comes from the diagonal errors, and the sum rules are only
relevant in the region where the errors are larger. The relative weights of the chiral sum rules
are chosen according to a stability analysis. Including the sum rules in the learning
procedure enforces the desired vanishing of the $\rho_{V-A}(s)$ spectral function,
which is badly needed for a reliable extraction of the nonperturbative vacuum condensates.

Obtaining maximum stability in our output is crucial for a proper parametrization of the
spectral function. In the case of the relative weights of the chiral sum rules, we train the same
network with different relative weights and search for the region where both contributions to
the error function, the contribution from the errors and the contribution from the sum rules,
are comparable. This is shown in figure (3). The observed behavior is not surprising: for
large relative weights the contribution from the sum rule is small because the training forces
the network to learn it but, as a consequence, the contribution from the experimental data increases. This
behavior can be observed explicitly if we plot the value of the sum rules, as calculated
with the output of the trained network, as a function of their relative weights. We
observe in figure (4) that, as the relative weight increases, the network output satisfies
the corresponding chiral sum rule more accurately.

Figure 3: Dependence of the partial contributions to the error function on the relative weights of the SR1
(left) and SR2 (right) chiral sum rules. Note that, as expected for normalized sum rules, the
stability region for the relative weight is close to 1.

Genetic algorithms thus allow us to implement additional constraints in the training of neural
networks in a smooth and efficient way. Their main drawback is the increase in training time,
because this is a random rather than a deterministic learning technique. In figure (5) we show
an example of the training of one replica. A sharp transition can be observed when the sum
rule constraints are introduced, but later the training forces the error function to stabilize at
a value similar to that of the initial training epoch. This sudden jump of the error function can
be understood as follows: when the sum rule constraints are introduced, the training tends to
satisfy them, so that the net output no longer follows the experimental values. Nevertheless,
as generations go on, the net output begins to recover the original situation, while still
satisfying the sum rule constraints. When the number of generations is large, the error
function approaches a value close to one, as is needed to keep systematic errors under control.

A key issue in this procedure is to guarantee the stability of the results with respect to the relative
weights of the chiral sum rules. In our training normalized sum rules are used, that is, if $A_j$ is
the theoretical value of the $j$-th chiral sum rule, the corresponding contribution to the error
function will be

$$w_{{\rm sr},j} \left( \int_0^{\infty} ds\, f_j\left(\rho_{V-A}(s)\right) - A_j \right)^2 / \left(A_j\right)^2 , \qquad (28)$$

and therefore we expect the relative weights in the stability region to be of order 1. The only exception is
the second Weinberg sum rule, eq. (8), whose relative weight has to be determined by demanding
stability of the network training, and turns out to be around $w_{{\rm sr},3} = 10^{-1}$.

Figure 4: Dependence of the value of the SR2 (left) and SR3 (right) chiral sum rules on their relative
weights. For relative weights close to 1 the chiral sum rules
are satisfied.

Let us emphasize that the two Weinberg chiral sum rules are well verified by our neural
network parametrization, and have thus been incorporated into the information contained in the
experimental data. This fact will be crucial later because different extraction methods, differing
in combinations of these chiral sum rules, can be shown to be equivalent in the asymptotic region
$s_0 \to \infty$. In figure (6) the two Weinberg sum rules, eqs. (7, 8), evaluated with the neural network
parametrization of the spectral function $\rho_{V-A}(s)$ are represented. Both chiral sum rules are
well verified in the asymptotic region, beyond the range of available experimental data. This
also ensures the stability of the evaluation of the condensates with respect to the specific
value of $s_0$ chosen, as long as it stays in the asymptotic region.

5 Determination of the nonperturbative condensates 

Using the neural parametrization of the spectral function, we can compute any given sum rule
for each trained replica. Because the neural parametrization retains all the experimental information
(it even allows for a determination of errors and correlations), we can view values coming
from the neural networks as direct experimental determinations of convolutions of the spectral
function $\rho_{V-A}(s)$. The values of the condensates $\langle O_6 \rangle$, $\langle O_8 \rangle$ and higher dimensional condensates
are then extracted from the value of an appropriate sum rule, eq. (12). The method we
follow is to evaluate the vacuum condensates as a function of the upper limit of integration
for each replica and then compute the mean and standard deviation over replicas. As has been explained before,
it is crucial to represent the value of the different sum rules as a function of the upper limit of
integration, to check both its convergence and its stability.
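
The procedure of evaluating a sum rule as a function of the upper limit of integration for each replica, and then averaging, can be sketched as below. The weight `s**power` and the absence of any normalization factor are assumptions of this illustration, not the actual sum rule of eq. (12), and the names `running_moment` and `moment_band` are hypothetical.

```python
import numpy as np

def running_moment(s_grid, rho_values, power):
    """Cumulative trapezoidal integral of s**power * rho(s) up to each grid point."""
    integrand = s_grid ** power * rho_values
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(s_grid)
    return np.concatenate(([0.0], np.cumsum(steps)))

def moment_band(s_grid, replica_values, power):
    """Mean and standard deviation over replicas, as functions of the upper limit.

    replica_values: shape (n_rep, n_grid), network output of each replica
    """
    curves = np.array([running_moment(s_grid, rho, power)
                       for rho in replica_values])
    return curves.mean(axis=0), curves.std(axis=0, ddof=1)
```

Plotting the returned mean and band against `s_grid` shows directly whether the sum rule saturates at large upper limit.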

Our method works as follows. First, we train a neural network on each replica. In a first
training epoch we do not use the sum rules, so that the training can reach the best possible
minimum. This is important when training neural networks because, when further constraints
are added to the training, as in our case the chiral sum rules, the goodness of the fit will
be better if we start from a deep local minimum. In a second training epoch, we add to
the fitness the contribution from the sum rules, where the relative weights are chosen so that
the sum rules never contribute more than the experimental errors to the
total fitness. The sum rules then act as a smooth constraint on the network training, being more
relevant in the regions with larger errors and thus enforcing the asymptotic vanishing of the
spectral function. This technique prevents the contribution of the chiral sum rules from becoming
so strong that it overwhelms the experimental data with their corresponding errors.

Figure 5: Dependence of the different contributions to the error function on the length of the training.
Note the sudden jump when the sum rules are incorporated into the training, and how the network
later returns to a configuration similar to the initial one.

5.1 Central values 

The first criterion to judge the reliability of a QCD sum rule is its independence, at large values
of $s_0$, from the value of the upper integration limit, that is, its saturation. We then need to
look for values of the final condensates which are stable against the limit of integration of
the sum rule. This stability criterion is completed by demanding independence of the results
from the specific polynomial entering the sum rule. Further criteria are stability with respect to
the precise artificial endpoints added to the data and with respect to the relative weights in
the error function used to train the neural networks.

Stable results are obtained for the dimension six condensate $\langle O_6 \rangle$, whereas higher
condensates, e.g. $\langle O_8 \rangle$, are less stable. Fig. (7) shows the outcome for $\langle O_6 \rangle$ and $\langle O_8 \rangle$ including the
propagation of statistical errors. It is clearly seen that convergence in the limit of integration
$s_0$ is obtained thanks to the addition of sum rules and endpoints in the learning procedure. The
central values for the condensates in the asymptotic limit $s_0 \to \infty$ come out to be:

$$\langle O_6 \rangle = -4.2 \times 10^{-3}\ {\rm GeV}^6 ,$$

$$\langle O_8 \rangle = -12.7 \times 10^{-3}\ {\rm GeV}^8 . \qquad (29)$$

Figure 6: Weinberg chiral sum rules, SR2 (left) and SR3 (right), evaluated on the neural network
parametrization of $\rho_{V-A}(s)$.

The value of $\langle O_6 \rangle$ is a cross check of the validity of our treatment: not only are there
strong theoretical arguments supporting the fact that $\langle O_6 \rangle$ is negative [33], [35], but also all
previous determinations with different techniques yield negative results, the majority of
them being compatible with ours within errors.

We note that our evaluation of the condensates is compatible with some of our previous
evaluations and has a similar error. This is misleading, though, as the error quoted here is only
statistical and a discussion of systematic errors is needed (it is given below). We can also obtain
values for the higher dimensional nonperturbative condensates:

$$\langle O_{10} \rangle = 7.8 \times 10^{-2}\ {\rm GeV}^{10} ,$$

$$\langle O_{12} \rangle = -2.6 \times 10^{-1}\ {\rm GeV}^{12} . \qquad (30)$$

Although stability deteriorates as compared to the case of the lower dimensional condensates,
these central values for the condensates alternate in sign.

5.2 Discussion of errors 

The discussion of the various sources of error is crucial to our treatment. We enumerate and
discuss them in turn:

1. Statistical error propagation from the experimental covariance matrix.

This is the best understood and best treated error source in our analysis. As explained above,
the neural network parametrization defines an unbiased probability measure in the space
of spectral functions that provides a nonlinear error propagation. This source of error is
kept under control by using averages over the Monte Carlo replicas. The band of central
values of the condensates allowed by this error propagation can be visualized in fig. (7).
Numerically, the contribution of the experimental error (statistics and correlations) to
the central values is

$$\langle O_6 \rangle = (-4.2 \pm 1.1) \times 10^{-3}\ {\rm GeV}^6 ,$$

$$\langle O_8 \rangle = (-12.7 \pm 6.4) \times 10^{-3}\ {\rm GeV}^8 ,$$

$$\langle O_{10} \rangle = (7.8 \pm 2.4) \times 10^{-2}\ {\rm GeV}^{10} ,$$

$$\langle O_{12} \rangle = (-2.6 \pm 0.8) \times 10^{-1}\ {\rm GeV}^{12} . \qquad (31)$$

Note that the sign obtained for each condensate remains unaltered within errors.

Figure 7: Condensates $\langle O_6 \rangle$ and $\langle O_8 \rangle$ as a function of $s_0$. The error bands only include the propagation
of experimental uncertainties.

2. Choice of the polynomial in the finite energy sum rule.

In principle, there are potentially important systematic uncertainties coming from the
method of extraction of the condensates. These are much more difficult to estimate, as can be noted
when looking through the extensive available literature. The extraction of $\langle O_6 \rangle$ turns out
to be clean and its errors are essentially of statistical nature. The uncertainty increases
with the dimension of the condensate. Let us elaborate further on these statements.

Consider the following convolutions

$$Z_{6a} = \int_0^{s_0} ds\, \frac{1}{2\pi^2}\, s^2\, \rho_{V-A}(s) , \qquad (32)$$

$$Z_{6b} = s_0^2 \int_0^{s_0} ds\, \frac{1}{2\pi^2} \left( 1 - \frac{s}{s_0} \right)^2 \rho_{V-A}(s) - f_\pi^2 s_0^2 . \qquad (33)$$

The second equation is only equivalent to the first if, for some $s_0$, both Weinberg sum
rules are satisfied. Although experimental data on tau decays do not exactly saturate
these sum rules, the neural network parametrization trained to obey all the sum rules
showed that the Weinberg sum rules can indeed be well verified in the asymptotic region. We
should then expect that $Z_{6a}$ and

$$Z_{6c} = \int_0^{s_0} ds\, \frac{1}{2\pi^2} \left( s^2 - 2 s s_0 \right) \rho_{V-A}(s) \qquad (34)$$

yield similar results for $\langle O_6 \rangle$ within errors, as can be seen in fig. (8). The same applies
to the dimension 8 condensate $\langle O_8 \rangle$, where we now define the following finite energy sum
rules:

$$Z_{8a} = - \int_0^{s_0} ds\, \frac{1}{2\pi^2}\, s^3\, \rho_{V-A}(s) , \qquad (35)$$

$$Z_{8b} = - \int_0^{s_0} ds\, \frac{1}{2\pi^2} \left( s^3 - s s_0^2 \right) \rho_{V-A}(s) , \qquad (36)$$

$$Z_{8c} = - \int_0^{s_0} ds\, \frac{1}{2\pi^2} \left( s^3 + 3 s_0^2 s \right) \rho_{V-A}(s) . \qquad (37)$$

We conclude that the neural network parametrization of spectral functions, properly
trained to accommodate all sum rules, provides estimates for the condensates which are
independent of the choice of a specific finite energy sum rule.

Figure 8: Extraction of $\langle O_6 \rangle$ and $\langle O_8 \rangle$ with different polynomials.

3. Dependence on the endpoints.

Our fit implements the asymptotic constraint that $\rho_{V-A}(s \to \infty) = 0$ by adding artificial
endpoints. It is then necessary to verify the degree of sensitivity of our output to the
precise location of these endpoints. As shown in Fig. (9), the sign of the dimension six
and eight condensates remains unaltered when the endpoints range between 3.5 and 4 GeV$^2$.
We also observe relatively large fluctuations of the central values, although they are
compatible within errors. This effect may be related to the presence of small wiggles in the spectral
function $\rho_{V-A}(s)$ for large $s$. The contribution of this source of uncertainty to $\langle O_8 \rangle$ turns
out to dominate over statistical uncertainties and can be estimated to be

$$-2.5 \times 10^{-2} < \langle O_8 \rangle < -5 \times 10^{-3} \qquad (38)$$

while for $\langle O_6 \rangle$ it is comparable to the uncertainty due to the statistical errors of the
experimental data, that is,

$$-6 \times 10^{-3} < \langle O_6 \rangle < -2 \times 10^{-3} . \qquad (39)$$

4. Chiral sum rules. 

This turns out to be the main source of systematic error for the dimension 6 condensate.
Chiral sum rules can be forced to be fulfilled to any desired degree of
precision by adequate training. This, though, introduces a large increase in the total error function, eq. (25),
coming from the experimental error piece. It is then necessary to make an appropriate
choice of relative weights between the error associated with experimental data and the error
associated with the fulfillment of chiral sum rules.

Figure 9: Value of the condensates $\langle O_6 \rangle$ (left) and $\langle O_8 \rangle$ (right) as a function of the position $s$ of the
first artificial endpoint. Note that their sign remains unchanged and that there is a stability
region near $s = 3.5$ GeV$^2$.

We may argue that the most appropriate relative weights for the normalized chiral
sum rules are $\mathcal{O}(1)$. This is due to the fact that the total error function jumps above
1 when too large relative weights are considered, as seen in figs. (0 EJ). We have thus
performed a multi-dimensional stability analysis searching for the relative weights that
produce the least sensitive final result, supplemented with the condition that in no
case can the contribution from the experimental errors to the error function be greater
than 1. The most suitable relative weights for the chiral sum rules turn out to be

$$w_{{\rm sr},1} = 1.0 , \quad w_{{\rm sr},2} = 5 \times 10^{2} , \quad w_{{\rm sr},3} = 0.3 , \quad w_{{\rm sr},4} = 1 \times 10^{2} . \qquad (40)$$

This stability analysis shows that the sign of the central values of the condensates is
not very sensitive to the relative weights of the chiral sum rules. The estimate of the
error associated with this uncertainty leads to the following range of values for the lowest
dimensional condensate

$$-6 \times 10^{-3} < \langle O_6 \rangle < -2 \times 10^{-3}\ {\rm GeV}^6 . \qquad (41)$$

For $\langle O_8 \rangle$ the statistical errors and the systematic error due to the position of the artificial
endpoints turn out to dominate over this source of uncertainty. Similar estimates for the
condensates of higher dimensions turn out not to be reliable, and therefore we present
only the central values obtained in this analysis together with the statistical errors.
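
The statement in item 2 above, that different polynomials lead to equivalent extractions once the Weinberg sum rules are satisfied, can be made explicit with a short worked relation. The moment notation $M_n(s_0)$ is introduced here only for illustration, overall normalization factors are omitted, and the polynomials are of the type appearing in eqs. (34) and (36).

```latex
M_n(s_0) \equiv \int_0^{s_0} ds\, s^n\, \rho_{V-A}(s) ,

\int_0^{s_0} ds\, \left( s^2 - 2 s s_0 \right) \rho_{V-A}(s) = M_2(s_0) - 2 s_0\, M_1(s_0) ,

\int_0^{s_0} ds\, \left( s^3 - s s_0^2 \right) \rho_{V-A}(s) = M_3(s_0) - s_0^2\, M_1(s_0) .
```

If the first moment vanishes at the chosen upper limit, $M_1(s_0) = 0$, which is the condition expressed by the second Weinberg sum rule when it is saturated at $s_0$, the extra terms drop out and the two combinations reduce to $M_2(s_0)$ and $M_3(s_0)$ respectively. Away from such a point the difference between polynomials is a direct measure of how badly the sum rule is violated.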

5.3 Analysis of the $s_0 = 1.5$ GeV$^2$ duality point

Some previous extractions of the condensates [27] are based on the existence of a
duality point around $s_0 \simeq 1.5$ GeV$^2$. Our neural network parametrization is such that the
second Weinberg sum rule is indeed verified around this point. Consequently, the values of the
condensates computed by truncating different finite energy sum rules at this point do agree, as
can be seen in fig. ©. Nevertheless, as shown in fig. (jHJ), the value of the condensates at this
duality point is different from the asymptotic one. The two results share the same sign, but not
the same magnitude, for $\langle O_6 \rangle$, and have opposite signs for $\langle O_8 \rangle$ (which is precisely
the result that the authors of Ref. [27] obtain).

These extractions using the first duality point can be justified by some resonance models
of the hadronic spectrum [28], inspired by the large-$N_c$ limit of QCD with the additional
assumption of the validity of the Minimal Hadronic Approximation (MHA), in which the spectral
functions are saturated by the pion pole, the first axial vector resonance and the first rho vector
resonance. Experimentally, it is observed that at different duality points, even as low as 1.5 GeV$^2$,
there is a local quark-hadron duality, meaning that the OPE at the quark-gluon level and that
evaluated with the entire resonance hadronic spectrum coincide. Whether this apparent duality
point at $s_0 = 1.5$ GeV$^2$ is an accident due only to the fulfillment of the second Weinberg sum
rule, or is really a consequence of the full QCD hadronic spectrum, remains to be understood.
What is clear from our analysis is that the condensates evaluated at this first duality point
are different from those obtained in the asymptotic regime $s_0 \to \infty$, where the validity of the
OPE is less questioned.

5.4 Spectral functions and the electroweak penguin $Q_7$

As a byproduct of our analysis, additional sum rules of the spectral function $\rho_{V-A}$ which are
relevant to phenomenology can be estimated. As an example$^3$, the sum rule

[...]    (42)

will be considered, where $\mu$ is an arbitrary factorization scale that cancels in the computation
of physical observables. In this analysis the value $\mu^2 = 2$ GeV$^2$ will be used. Eq. (42) is
relevant to the evaluation of Im $G_E$, where $G_E$ is one of the couplings of the low energy chiral
Lagrangian describing $|\Delta S| = 1$ transitions [26]. The importance of a precise determination of
this coupling lies in the fact that Im $G_E$ is one of the most important contributions to $\varepsilon'$ in
the Standard Model. In turn, its value is dominated by the electroweak penguin contributions
$Q_7$ and $Q_8$, which explains why the data on spectral functions from hadronic tau decays are
important in its determination.

Following the same steps that led to the determination of the vacuum condensates, the
same procedure is repeated for the sum rule eq. (42). The result obtained in the
asymptotic limit $s_0 \to \infty$ is

$$A_{LR} = (6.9 \pm 1.6) \times 10^{-3}\ {\rm GeV}^6 , \qquad (43)$$

as can be seen in figure (10). It should be noted that the present determination is in good
agreement with that obtained in the original work$^4$ [26]. The quoted error only refers to the
propagation of experimental uncertainties.

$^3$ In this section the work of Ref. [26] is followed; the reader is directed to this reference for definitions and
notation.

$^4$ Note however that a different normalization for the spectral correlator is used.

Figure 10: Evaluation of eq. (42) for different values of $s_0$. The error bands include the propagation of
experimental uncertainties.



5.5 Results and comparison with other determinations 

Our determination of the nonperturbative condensates, including the statistical error coming
from the experimental errors and correlations, was given in eq. (31). However, the error which
dominates the determination of $\langle O_6 \rangle$ comes from the relative weights of the chiral sum rules
to be obeyed. We have performed a stability analysis on these relative weights that produces
the final result:

$$\langle O_6 \rangle = (-4.0 \pm 2.0) \times 10^{-3}\ {\rm GeV}^6 . \qquad (44)$$

As explained above, for the dimension 8 condensate the systematic error associated with the
endpoint position is comparable to the statistical uncertainty; the two combine to yield

$$\langle O_8 \rangle = \left( -12^{+7}_{-11} \right) \times 10^{-3}\ {\rm GeV}^8 . \qquad (45)$$

For higher dimensional condensates it is much more difficult to estimate the different sources of
systematic uncertainty. We therefore quote the central values we obtained and their statistical
errors:

$$\langle O_{10} \rangle = (7.8 \pm 2.4) \times 10^{-2}\ {\rm GeV}^{10} ,$$

$$\langle O_{12} \rangle = (-2.6 \pm 0.8) \times 10^{-1}\ {\rm GeV}^{12} . \qquad (46)$$

A similar analysis has been performed with the OPAL data, yielding equivalent results but with
larger errors, due to the larger statistical uncertainties as compared to the ALEPH experimental
data. The values of the QCD nonperturbative condensates have been previously extracted from
the ALEPH and OPAL data, with different techniques and different results, as summarized in
Table 3.

Note that our results agree, at least in sign, with those of Refs. [26], [29], [30]. This is
also true for the higher dimensional condensates of Ref. [22], where the authors obtain:

$$\langle O_{10} \rangle = (4.8 \pm 1.0) \times 10^{-2}\ {\rm GeV}^{10} ,$$

$$\langle O_{12} \rangle = (-1.6 \pm 0.26) \times 10^{-1}\ {\rm GeV}^{12} , \qquad (47)$$

in agreement with eq. (31), although the errors in our determination are somewhat larger.

Table 3: Previous extractions of the condensates, ordered chronologically. Appropriate rescalings have
been performed to allow the comparison of different extractions.

    Reference     $\langle O_6 \rangle \times 10^3$ (GeV$^6$)     $\langle O_8 \rangle \times 10^3$ (GeV$^8$)
    Ref. [23]     [illegible]                7 ± 4
    Ref. [26]     -3.2 ± 2.0                 -12.4 ± 9.0
    Ref. [27]     -9.5 ± 3                   16.2 ± 5
    Ref. [29]     -4.45 ± 0.7                -6.2 ± 3.2
    Ref. [30]     -4 ± 1                     -1.2 ± 6
    This work     -4 ± 2                     -12 (+7, -11)

There are some differences between these previous determinations and the present one,
eq. (46). The first is that we do not make any assumption on the values of the higher
dimensional nonperturbative condensates. In many analyses the effect of $\langle O_D \rangle$ for $D > 10$ is
simply neglected to obtain closed expressions for the condensates. In our analysis, though, we
do not need to make this hypothesis. Nor do we need to assume that the chiral sum rules
are verified, because the chiral sum rules enter as constraints in the genetic algorithm training
(see fig. ). This is relevant because previous analyses showed that the chiral sum rules are
not verified for $s_0 = M_\tau^2$, except for the DMO sum rule, implying that one must be extremely
careful when dealing with them. A second main difference is the absence of the theoretical bias
introduced in other analyses by the choice of a given finite energy sum rule. Moreover, the
smooth interpolating capability of the neural network allows the integration range to be extended
to arbitrarily high energies.

6 Conclusions 

We have presented a determination of the nonperturbative vacuum condensates $\langle O_6 \rangle$ and $\langle O_8 \rangle$
from the spectral functions of hadronic tau decays, aimed at minimizing the sources of theoretical
bias which might be a cause of concern in existing determinations of these condensates from
spectral functions. This determination is based on a bias-free neural network parametrization
of the $v_1(s) - a_1(s)$ spectral function, inferred from the data, which retains all the information
on experimental errors and correlations, and is supplemented with the additional theoretical input
of the chiral sum rules.

Our final results give negative central values for the dimension 6 and 8 condensates. These
results take into account the propagation of statistical errors and their correlations. Moreover,
the main source of systematic error in our procedure is identified as the choice of relative weights
assigned to the chiral sum rules in the fitness function used to train the neural networks. In the
case of the dimension 6 condensate a stability analysis can be performed. Higher dimension
condensates carry larger errors, although the sign of the condensates seems to remain unaltered.






The sign of the dimension eight condensate $\langle O_8 \rangle$ deserves further comment. Our central
value is negative within statistical errors but is sensitive to the position of the artificial endpoints
added to enforce the vanishing of the spectral function for large values of $s$. This produces a
systematic bias, as possible wiggles of the spectral function around the asymptotic zero value are
suppressed. Those wiggles may indeed produce a change of sign of $\langle O_8 \rangle$. This is not the case in
our approach, as the smoothness of the neural network tends to avoid such wiggles, which might
lead to a systematic error. Although our results seem to point in the same direction as other
recent extractions of $\langle O_8 \rangle$, we consider the issue of the sign of the vacuum condensate
$\langle O_8 \rangle$ to remain open.

Another result of this work is the implementation of a technique based on genetic algorithm
neural network training, which extends the capabilities of neural network data analysis by allowing
non-local constraints, such as convolutions, to be incorporated in the training. This technique extends
previous efforts [3] oriented to the improvement of high energy physics data analysis, especially
for the strong interaction sector.

A Neural network techniques



In this section a brief review of the neural network training algorithms that have been used in
this analysis is presented. Two different learning algorithms have been used for the neural
network training: in a first epoch the only contribution to the error function comes from the
statistical experimental errors, and the learning algorithm used is known as backpropagation.
Then, in a second training epoch, the contribution from the chiral sum rules is added to
the error function. Since convolutions over the neural network output are non-local
constraints, the previous technique is no longer useful and another learning algorithm is needed,
which is called genetic algorithm training. Each of these techniques will now be introduced,
directing the interested reader to the standard reviews and textbooks [32] on neural networks
and applications.

Learning by backpropagation allows multilayer neural networks to be trained and has proved
to be an excellent tool in classification, interpolation and prediction tasks. It is a standard
technique that has recently been applied to data analysis in high energy physics, see Ref. [3].
The starting point is a set of input-output patterns,

$$(x^{\mu}, z^{\mu}) \in R^n \times R^m , \qquad \mu = 1, \ldots, p , \qquad (48)$$

that the network must learn. In our case, each input-output pattern consists of a single data point,
the input being the energy $s$ and the output the spectral function $\rho_{V-A}(s)$.

The basis of the network learning is the error function, also known as fitness functional. 
The error function is defined as the difference between the actual and the desired output of the 
net, measured over the training set, and weighted with the experimental errors. It is given by 

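Applying the gradient descent minimization procedure, that is, following the direction of 
steepest descent of the error function, one can determine the changes in the network parameters 
that decrease the error function. The error is introduced in the units of 
the last layer by 

$$\Delta_i^{(L)p} = g'\!\left(h_i^{(L)p}\right)\left[o_i(\mathbf{x}^p) - z_i^p\right] , \tag{50}$$

and then backpropagated to the rest of the network, 

$$\Delta_j^{(l-1)p} = g'\!\left(h_j^{(l-1)p}\right)\sum_{i=1}^{n_l} \Delta_i^{(l)p}\,\omega_{ij}^{(l)} , \tag{51}$$

and the last step consists of the update of the weights and thresholds of the net, 

$$\delta\omega_{ij}^{(l)} = -\eta \sum_p \Delta_i^{(l)p}\,\xi_j^{(l-1)p} + \alpha\,\delta\omega_{ij}^{(l)}(\mathrm{last}) , \tag{52}$$

$$\delta\theta_i^{(l)} = -\eta \sum_p \Delta_i^{(l)p} + \alpha\,\delta\theta_i^{(l)}(\mathrm{last}) , \tag{53}$$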

where $\eta$ is the learning rate parameter, which controls the speed of the training, and the term 
with $\alpha$ is called a momentum term, which helps the training avoid getting stuck in a local 
minimum. The main advantages of this method are that it is deterministic and that it has been 
applied successfully in many different situations. 
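
The update rule with learning rate and momentum described above can be illustrated with a minimal 
sketch. It is not the code used in this analysis: the network size (one input, one sigmoid hidden 
layer, one linear output), the error function $E = \frac{1}{2}\sum_p (o(x^p) - z^p)^2/\sigma_p^2$ 
with uncorrelated errors, the convention of adding thresholds to the neuron input, and all names 
(`init_net`, `forward`, `error`, `train`) are assumptions made here for illustration. 

```python
import math
import random


def init_net(n_hidden, rng):
    """Random 1 -> n_hidden -> 1 network (assumed: sigmoid hidden layer, linear output)."""
    return {
        "w1": [rng.uniform(-1, 1) for _ in range(n_hidden)],  # input -> hidden weights
        "t1": [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden thresholds
        "w2": [rng.uniform(-1, 1) for _ in range(n_hidden)],  # hidden -> output weights
        "t2": [rng.uniform(-1, 1)],                           # output threshold
    }


def forward(net, x):
    """Return the hidden activations and the network output for input x."""
    hidden = [1.0 / (1.0 + math.exp(-(w * x + t)))
              for w, t in zip(net["w1"], net["t1"])]
    out = sum(w * h for w, h in zip(net["w2"], hidden)) + net["t2"][0]
    return hidden, out


def error(net, data):
    """Assumed error function: half the sum of squared residuals weighted by 1/sigma^2."""
    return 0.5 * sum(((forward(net, x)[1] - z) / sigma) ** 2 for x, z, sigma in data)


def train(net, data, eta=0.005, alpha=0.5, n_steps=2000):
    """Batch gradient descent with momentum; data is a list of (x, z, sigma)."""
    last = {k: [0.0] * len(v) for k, v in net.items()}  # previous updates
    for _ in range(n_steps):
        grad = {k: [0.0] * len(v) for k, v in net.items()}
        for x, z, sigma in data:
            hidden, out = forward(net, x)
            d_out = (out - z) / sigma ** 2           # error introduced in the last layer
            grad["t2"][0] += d_out
            for j, h in enumerate(hidden):
                # error backpropagated to hidden unit j (sigmoid derivative h * (1 - h))
                d_hid = h * (1.0 - h) * net["w2"][j] * d_out
                grad["w2"][j] += d_out * h
                grad["w1"][j] += d_hid * x
                grad["t1"][j] += d_hid
        for k in net:
            for j in range(len(net[k])):
                # steepest-descent step plus momentum term
                step = -eta * grad[k][j] + alpha * last[k][j]
                net[k][j] += step
                last[k][j] = step
    return net
```

Calling `train(init_net(3, random.Random(0)), data)` on a list of `(x, z, sigma)` triples updates the 
network in place. The values of `eta` and `alpha` are illustrative and would need tuning for real data. 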

As stated above, the second of the neural network techniques used in our analysis 
is genetic algorithm learning, also known as natural selection learning. Since the 
chiral sum rules are convolutions of the network output, they are non-local contributions to the 
error function: they depend not on a single network output but on the global 
output. The usual backpropagation training techniques are therefore not applicable in this second 
epoch of our training. Now the error functional has the form of eq. (49) but with additional 
convolutions of the neural network output, 

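$$E = \sum_{i} \frac{\left(\rho^{(\mathrm{net})}(s_i) - \rho^{(\mathrm{exp})}(s_i)\right)^2}{\sigma_i^2} + \sum_{i} w_i \left( \int ds\, f_i\!\left(\rho^{(\mathrm{net})}(s)\right) - A_i \right)^2 , \tag{54}$$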

where $A_i$ is the theoretical value of the $i$-th sum rule and $w_i$ is the relative weight of this 
sum rule. In this case genetic algorithms are used, a training method inspired by the theory of 
evolution in biology. In this method, the network parameters are encoded as bits in a DNA 
chain. These chains are replicated, some bits are then mutated with a certain probability, 
and only the chains with the smallest value of the error functional survive. By analogy, 
this method is also known as natural selection learning. The recursive process can be 
summarized as follows: the starting point is the set of parameters that define the neural network, 

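$$(\omega^{(2)}_{11}, \omega^{(2)}_{12}, \ldots, \theta^{(2)}_1, \theta^{(2)}_2, \ldots)$$

Creation of DNA chain 

Replication of DNA chains 

$$\mathrm{DNA}_1 = (\omega^{(2)}_{11}, \omega^{(2)}_{12}, \ldots, \theta^{(2)}_1, \theta^{(2)}_2, \ldots) , \quad \mathrm{DNA}_2 = (\omega^{(2)}_{11}, \omega^{(2)}_{12}, \ldots, \theta^{(2)}_1, \theta^{(2)}_2, \ldots) , \ \ldots$$

Random mutation of bits in the DNA chains 

$$\mathrm{DNA}_1 = (\omega^{(2)}_{11} + \delta_1, \omega^{(2)}_{12}, \ldots, \theta^{(2)}_1, \theta^{(2)}_2, \ldots) , \quad \mathrm{DNA}_2 = (\omega^{(2)}_{11} + \delta_2, \omega^{(2)}_{12}, \ldots, \theta^{(2)}_1 + \delta_2, \theta^{(2)}_2, \ldots) , \ \ldots$$

Selection of best DNA chain 

$$\mathrm{DNA}_j = (\omega^{(2)}_{11} + \delta_j, \ldots, \theta^{(2)}_1 + \delta_j, \theta^{(2)}_2, \ldots)$$

New weights and thresholds and decrease of fitness 
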

This is a very simple genetic algorithm. Further modifications could be added to improve its 
efficiency, such as crossing between individuals (each characterized by a DNA chain), but our 
analysis showed that this was not necessary. Its main drawbacks are that it is random rather than 
deterministic, unlike the backpropagation algorithm, and that it requires careful adjustment of 
many parameters (rate of mutations, size of the population). These parameters have been adjusted 
following two requirements: efficiency of the learning and stability of the result. Learning by 
genetic algorithms therefore makes it possible to impose the theoretical constraints from the 
chiral sum rules in a natural way. 
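
The replication, mutation and selection cycle described above can be sketched as follows. This is 
an illustration under stated assumptions, not the implementation used in this analysis: the 
parameters are mutated as real numbers by Gaussian shifts rather than as bit strings, the parent 
chain is kept whenever no mutant improves on it, a single sum rule is taken to be the plain 
integral of the network output over $[0, 1]$, and the tiny 1-2-1 network, the trapezoidal rule 
and all names (`net_output`, `integral`, `error`, `evolve`) are hypothetical. 

```python
import math
import random


def net_output(params, x):
    """Tiny 1-2-1 network (assumed): two tanh hidden units and a linear output."""
    w1, w2, t1, t2, v1, v2, t3 = params
    return v1 * math.tanh(w1 * x + t1) + v2 * math.tanh(w2 * x + t2) + t3


def integral(params, a=0.0, b=1.0, n=50):
    """Trapezoidal estimate of the integral of the network output over [a, b]."""
    h = (b - a) / n
    ys = [net_output(params, a + k * h) for k in range(n + 1)]
    return h * (0.5 * ys[0] + sum(ys[1:-1]) + 0.5 * ys[-1])


def error(params, data, target, weight):
    """Data term plus one weighted non-local constraint on the integral."""
    chi2 = sum(((net_output(params, x) - z) / sigma) ** 2 for x, z, sigma in data)
    return chi2 + weight * (integral(params) - target) ** 2


def evolve(params, data, target, weight, rng,
           n_copies=20, p_mut=0.3, size=0.1, n_gen=200):
    """Replicate the best chain, mutate the copies at random, keep the fittest."""
    best = list(params)
    best_err = error(best, data, target, weight)
    for _ in range(n_gen):
        # replication followed by random mutation of each copy
        mutants = [
            [p + rng.gauss(0.0, size) if rng.random() < p_mut else p for p in best]
            for _ in range(n_copies)
        ]
        # selection: the chain with the smallest error functional survives
        candidate = min(mutants, key=lambda m: error(m, data, target, weight))
        candidate_err = error(candidate, data, target, weight)
        if candidate_err < best_err:
            best, best_err = candidate, candidate_err
    return best, best_err
```

Because the parent is retained when no mutant is better, the error functional never increases from 
one generation to the next. The mutation probability `p_mut`, the mutation size `size` and the 
population size `n_copies` play the role of the parameters that must be adjusted by hand. 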

Finally, we would like to comment on a new learning algorithm that combines the main 
advantages of the two methods: it is deterministic, and therefore the simulation time 
is smaller, but at the same time it supports non-local contributions to the error function. 
During the realization of this work, a novel technique was developed that allows the 
backpropagation learning algorithm to be used in the case of eq. (54), when the error functional 
contains convolutions. This made it possible to check that the results obtained with the genetic 
algorithm approach were correct. In brief, this technique consists in noticing that an integral 
can be determined to any desired accuracy by a finite sum of local contributions, where in this 
context local means that each contribution depends on only one network output. In fact, this is 
what any numerical integration method does, so it is clear that training algorithms for 
backpropagation learning can be implemented. The result is that for a discretization of the 
integral of the form 

(55) 

where the coefficients depend on the method, applying the usual backpropagation condition 
(variation of weights and thresholds in the direction of steepest descent of the error function) to 
the convolution term one finds that the corresponding equations are the backpropagation equations 
but with eq. (50) replaced by 

(56) 

where for simplicity we have considered only one sum rule and $z$ is a normalization factor, 
present because the inputs and outputs of the neural network are normalized 
so that the neuron activations always lie within the sensitivity range of the 
activation function. In eq. (56), $o_k$ denotes the output of the network when the input $x_k$ is 
introduced, $f(o(x))$ is the convolution that we want the network to learn and $A$ is its theoretical 
value. In this equation each term should be understood as a new pattern for the neural network 
training. From here the usual backpropagation techniques apply. This novel technique 
was implemented in the present analysis, but it improved neither the quality nor the 
speed of the training, so the genetic algorithm technique was maintained for the training 
with convolutions. This technique, which is called backpropagation for convolutions, has many 
applications and will be the subject of future work. 
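
The observation that a discretized convolution term splits into local contributions can be checked 
numerically with a short sketch. It assumes a penalty of the form 
$w\,(\sum_k c_k f(o_k) - A)^2$ with trapezoidal coefficients $c_k$, and it omits the normalization 
factor $z$ mentioned above. The symbol $c_k$ and the function names are introduced here for 
illustration only and are not taken from the original equations. 

```python
def trapezoid_coeffs(xs):
    """Quadrature coefficients c_k of the trapezoidal rule on the grid xs."""
    coeffs = [0.0] * len(xs)
    for k in range(len(xs) - 1):
        h = xs[k + 1] - xs[k]
        coeffs[k] += 0.5 * h
        coeffs[k + 1] += 0.5 * h
    return coeffs


def penalty(outputs, coeffs, f, target, weight=1.0):
    """Discretized constraint term: weight * (sum_k c_k f(o_k) - target)^2."""
    total = sum(c * f(o) for c, o in zip(coeffs, outputs))
    return weight * (total - target) ** 2


def output_deltas(outputs, coeffs, f, df, target, weight=1.0):
    """Derivative of the penalty with respect to each network output o_k.

    Entry k depends on o_k only through c_k * df(o_k), times a global residual
    shared by all grid points, so each grid point can be fed to ordinary
    backpropagation as an extra pattern.
    """
    residual = sum(c * f(o) for c, o in zip(coeffs, outputs)) - target
    return [2.0 * weight * residual * c * df(o) for c, o in zip(coeffs, outputs)]
```

Each entry of `output_deltas` would take the place of the last-layer error for the corresponding 
grid point, after which the errors are propagated backwards as usual. 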